{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "cb212650-86b7-4298-8bbd-c20a5227fbf0",
   "metadata": {},
   "source": [
    "# Advanced RAG with temporal filters using LlamaIndex and KDB.AI vector store\n",
    "\n",
    "##### Note: This example requires a KDB.AI endpoint and API key. Sign up for a free [KDB.AI account](https://kdb.ai/get-started).\n",
    "\n",
    "> [KDB.AI](https://kdb.ai/) is a powerful knowledge-based vector database and search engine that allows you to build scalable, reliable AI applications, using real-time data, by providing advanced search, recommendation and personalization.\n",
    "\n",
    "This example demonstrates how to use KDB.AI to run semantic search, summarization and analysis of financial regulations around a specific point in time.\n",
    "\n",
    "To access your endpoint and API keys, [sign up to KDB.AI](https://kdb.ai/get-started).\n",
    "\n",
    "To set up your development environment, follow the instructions on the KDB.AI pre-requisites page.\n",
    "\n",
    "The following examples demonstrate some of the ways you can interact with KDB.AI through LlamaIndex."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "32f36ddd-aa4d-4284-a236-be3028758ae2",
   "metadata": {},
   "source": [
    "## Install dependencies with Pip\n",
    "\n",
    "To run this sample successfully, follow the steps that match where you are running this notebook:\n",
    "\n",
    "- ***Run Locally / Private Environment:*** The [Setup](https://github.com/KxSystems/kdbai-samples/blob/main/README.md#setup) steps in the repository's `README.md` will guide you on prerequisites and how to run this with Jupyter.\n",
    "\n",
    "- ***Colab / Hosted Environment:*** Open this notebook in Colab and run through the cells."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2QITjsy5Jois",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index llama-index-llms-openai llama-index-embeddings-openai llama-index-readers-file llama-index-vector-stores-kdbai\n",
    "!pip install kdbai_client pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "68ba14b7-1208-4494-93ec-ce1930b7bf5b",
   "metadata": {},
   "source": [
    "## Import dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f4ca21f7-819c-4abb-8479-e7c6f3175f34",
   "metadata": {},
   "outputs": [],
   "source": [
    "from getpass import getpass\n",
    "import re\n",
    "import os\n",
    "import shutil\n",
    "import time\n",
    "import urllib\n",
    "import datetime\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "from llama_index.core import (\n",
    "    Settings,\n",
    "    SimpleDirectoryReader,\n",
    "    StorageContext,\n",
    "    VectorStoreIndex,\n",
    ")\n",
    "from llama_index.core.node_parser import SentenceSplitter\n",
    "from llama_index.core.retrievers import VectorIndexRetriever\n",
    "from llama_index.embeddings.openai import OpenAIEmbedding\n",
    "from llama_index.llms.openai import OpenAI\n",
    "from llama_index.vector_stores.kdbai import KDBAIVectorStore\n",
    "\n",
    "import kdbai_client as kdbai\n",
    "\n",
    "OUTDIR = \"pdf\"\n",
    "RESET = True"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ef2f90f",
   "metadata": {},
   "source": [
    "#### Set OpenAI API key and choose the LLM and Embedding model to use:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7WV4ydgSRlnV",
   "metadata": {},
   "outputs": [],
   "source": [
    "# os.environ[\"OPENAI_API_KEY\"] = getpass(\"OpenAI API key: \")\n",
    "os.environ[\"OPENAI_API_KEY\"] = (\n",
    "    os.environ[\"OPENAI_API_KEY\"]\n",
    "    if \"OPENAI_API_KEY\" in os.environ\n",
    "    else getpass(\"OpenAI API Key: \")\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15d1eaf6-8743-4c5b-a2dd-81e6101a4c33",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from getpass import getpass\n",
    "\n",
    "# Set OpenAI API key\n",
    "if \"OPENAI_API_KEY\" in os.environ:\n",
    "    OPENAI_API_KEY = os.environ[\"OPENAI_API_KEY\"]\n",
    "else:\n",
    "    # Prompt the user to enter the API key\n",
    "    OPENAI_API_KEY = getpass(\"OPENAI API KEY: \")\n",
    "    # Save the API key as an environment variable for the current session\n",
    "    os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "17e19f9f",
   "metadata": {},
   "outputs": [],
   "source": [
    "EMBEDDING_MODEL = \"text-embedding-3-small\"\n",
    "GENERATION_MODEL = \"gpt-4o-mini\"\n",
    "\n",
    "llm = OpenAI(model=GENERATION_MODEL)\n",
    "embed_model = OpenAIEmbedding(model=EMBEDDING_MODEL)\n",
    "\n",
    "Settings.llm = llm\n",
    "Settings.embed_model = embed_model"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2ca073f5-3d84-4b0e-8684-1396c1311fb8",
   "metadata": {},
   "source": [
    "## Create KDB.AI session and table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "edf7d774",
   "metadata": {},
   "outputs": [],
   "source": [
    "# vector DB imports\n",
    "import os\n",
    "from getpass import getpass\n",
    "import kdbai_client as kdbai\n",
    "import time"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "816fa95c",
   "metadata": {},
   "source": [
    "##### Option 1. KDB.AI Cloud\n",
    "\n",
    "To use KDB.AI Cloud, you will need two session details - a URL endpoint and an API key.\n",
    "To get these you can sign up for free [here](https://trykdb.kx.com/kdbai/signup).\n",
    "\n",
    "You can connect to a KDB.AI Cloud session using `kdbai.Session` and passing the session URL endpoint and API key details from your KDB.AI Cloud portal.\n",
    "\n",
    "If the environment variables `KDBAI_ENDPOINT` and `KDBAI_API_KEY` exist on your system and contain your KDB.AI Cloud portal details, the cell below uses them to connect.\n",
    "If these do not exist, it will prompt you to enter your KDB.AI Cloud portal session URL endpoint and API key details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3991d719",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set up KDB.AI endpoint and API key\n",
    "KDBAI_ENDPOINT = (\n",
    "    os.environ[\"KDBAI_ENDPOINT\"]\n",
    "    if \"KDBAI_ENDPOINT\" in os.environ\n",
    "    else input(\"KDB.AI endpoint: \")\n",
    ")\n",
    "KDBAI_API_KEY = (\n",
    "    os.environ[\"KDBAI_API_KEY\"]\n",
    "    if \"KDBAI_API_KEY\" in os.environ\n",
    "    else getpass(\"KDB.AI API key: \")\n",
    ")\n",
    "\n",
    "session = kdbai.Session(endpoint=KDBAI_ENDPOINT, api_key=KDBAI_API_KEY)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bf750140",
   "metadata": {},
   "source": [
    "##### Option 2. KDB.AI Server\n",
    "\n",
    "To use KDB.AI Server, you will need to download and run your own container.\n",
    "To do this, you will first need to sign up for free [here](https://trykdb.kx.com/kdbaiserver/signup/).\n",
    "\n",
    "You will receive an email with the required license file and bearer token needed to download your instance.\n",
    "Follow instructions in the signup email to get your session up and running.\n",
    "\n",
    "Once the [setup steps](https://code.kx.com/kdbai/gettingStarted/kdb-ai-server-setup.html) are complete you can then connect to your KDB.AI Server session using `kdbai.Session` and passing your local endpoint."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "77e9b64c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# session = kdbai.Session()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6d392b5e",
   "metadata": {},
   "source": [
    "### Create the schema for your KDB.AI table\n",
    "\n",
    "***!!! Note:*** The 'dims' parameter in the embedding column must reflect the output dimensions of the embedding model you choose.\n",
    "\n",
    "\n",
    "- OpenAI 'text-embedding-3-small' outputs 1536 dimensions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9104c851",
   "metadata": {},
   "outputs": [],
   "source": [
    "schema = [\n",
    "    {\"name\": \"document_id\", \"type\": \"bytes\"},\n",
    "    {\"name\": \"text\", \"type\": \"bytes\"},\n",
    "    {\"name\": \"embeddings\", \"type\": \"float32s\"},\n",
    "    {\"name\": \"title\", \"type\": \"str\"},\n",
    "    {\"name\": \"publication_date\", \"type\": \"datetime64[ns]\"},\n",
    "]\n",
    "\n",
    "\n",
    "indexFlat = {\n",
    "    \"name\": \"flat_index\",\n",
    "    \"type\": \"flat\",\n",
    "    \"column\": \"embeddings\",\n",
    "    \"params\": {\"dims\": 1536, \"metric\": \"L2\"},\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "df598cbd",
   "metadata": {},
   "outputs": [],
   "source": [
    "KDBAI_TABLE_NAME = \"reports\"\n",
    "database = session.database(\"default\")\n",
    "\n",
    "# First ensure the table does not already exist\n",
    "for table in database.tables:\n",
    "    if table.name == KDBAI_TABLE_NAME:\n",
    "        table.drop()\n",
    "        break\n",
    "\n",
    "# Create the table\n",
    "table = database.create_table(\n",
    "    KDBAI_TABLE_NAME, schema=schema, indexes=[indexFlat]\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a208460c-2c87-4a3f-9926-65d6dcc4b45d",
   "metadata": {},
   "source": [
    "## Financial reports URLs and metadata"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6143a9ec-7d48-4f61-bb86-a1de427a0279",
   "metadata": {},
   "outputs": [],
   "source": [
    "INPUT_URLS = [\n",
    "    \"https://www.govinfo.gov/content/pkg/PLAW-106publ102/pdf/PLAW-106publ102.pdf\",\n",
    "    \"https://www.govinfo.gov/content/pkg/PLAW-111publ203/pdf/PLAW-111publ203.pdf\",\n",
    "]\n",
    "\n",
    "METADATA = {\n",
    "    \"pdf/PLAW-106publ102.pdf\": {\n",
    "        \"title\": \"GRAMM–LEACH–BLILEY ACT, 1999\",\n",
    "        \"publication_date\": pd.to_datetime(\"1999-11-12\"),\n",
    "    },\n",
    "    \"pdf/PLAW-111publ203.pdf\": {\n",
    "        \"title\": \"DODD-FRANK WALL STREET REFORM AND CONSUMER PROTECTION ACT, 2010\",\n",
    "        \"publication_date\": pd.to_datetime(\"2010-07-21\"),\n",
    "    },\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e1e6c6c5-f151-4c01-a9a1-ab1d540402eb",
   "metadata": {},
   "source": [
    "## Download PDF files locally"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3ee36757-b0b7-4478-a71a-601220048a05",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading https://www.govinfo.gov/content/pkg/PLAW-106publ102/pdf/PLAW-106publ102.pdf...\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading https://www.govinfo.gov/content/pkg/PLAW-111publ203/pdf/PLAW-111publ203.pdf...\n",
      "CPU times: user 52.6 ms, sys: 1.2 ms, total: 53.8 ms\n",
      "Wall time: 7.86 s\n"
     ]
    }
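   ],
   "source": [
    "%%time\n",
    "\n",
    "# download_file() below relies on these modules; a bare `import urllib`\n",
    "# does not guarantee that urllib.request and urllib.error are loaded.\n",
    "import logging\n",
    "import urllib.error\n",
    "import urllib.request\n",
    "\n",
    "CHUNK_SIZE = 512 * 1024\n",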
    "\n",
    "\n",
    "def download_file(url):\n",
    "    print(\"Downloading %s...\" % url)\n",
    "    out = os.path.join(OUTDIR, os.path.basename(url))\n",
    "    try:\n",
    "        response = urllib.request.urlopen(url)\n",
    "    except urllib.error.URLError as e:\n",
    "        logging.exception(\"Failed to download %s !\" % url)\n",
    "    else:\n",
    "        with open(out, \"wb\") as f:\n",
    "            while True:\n",
    "                chunk = response.read(CHUNK_SIZE)\n",
    "                if chunk:\n",
    "                    f.write(chunk)\n",
    "                else:\n",
    "                    break\n",
    "    return out\n",
    "\n",
    "\n",
    "if RESET:\n",
    "    if os.path.exists(OUTDIR):\n",
    "        shutil.rmtree(OUTDIR)\n",
    "    os.mkdir(OUTDIR)\n",
    "\n",
    "    local_files = [download_file(x) for x in INPUT_URLS]\n",
    "    local_files[:10]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10d52ad8-b9bd-459e-9ce4-c370982a149f",
   "metadata": {},
   "source": [
    "## Load local PDF files with LlamaIndex"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9714258b-f58a-4964-9d3f-0298a98b87e0",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 8.22 s, sys: 9.04 ms, total: 8.23 s\n",
      "Wall time: 8.23 s\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "994"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "\n",
    "def get_metadata(filepath):\n",
    "    return METADATA[filepath]\n",
    "\n",
    "\n",
    "documents = SimpleDirectoryReader(\n",
    "    input_files=local_files,\n",
    "    file_metadata=get_metadata,\n",
    ")\n",
    "\n",
    "docs = documents.load_data()\n",
    "len(docs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2f3ba953-4034-4421-acf0-dbac33dfed67",
   "metadata": {},
   "source": [
    "## Set up the LlamaIndex RAG pipeline using the KDB.AI vector store"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4ed27c4b-a979-4d4f-9f17-7c6bd9844d9a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 3.67 s, sys: 31.9 ms, total: 3.7 s\n",
      "Wall time: 22.3 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "# Wrap the KDB.AI table so LlamaIndex can use it as a vector store\n",
    "vector_store = KDBAIVectorStore(table)\n",
    "\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    docs,\n",
    "    storage_context=storage_context,\n",
    "    transformations=[SentenceSplitter(chunk_size=2048, chunk_overlap=0)],\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f5-Ilwz2UawR",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>document_id</th>\n",
       "      <th>text</th>\n",
       "      <th>embeddings</th>\n",
       "      <th>title</th>\n",
       "      <th>publication_date</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>b'272d7d24-c232-41b6-823e-27aa6203c100'</td>\n",
       "      <td>b'PUBLIC LAW 106\\xc2\\xb1102\\xc3\\x90NOV. 12, 19...</td>\n",
       "      <td>[0.034452137, 0.03166917, -0.011892043, 0.0184...</td>\n",
       "      <td>GRAMM–LEACH–BLILEY ACT, 1999</td>\n",
       "      <td>1999-11-12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>b'89e3f2ee-f5a6-4e40-bb81-0632f08341f0'</td>\n",
       "      <td>b\"113 STAT. 1338 PUBLIC LAW 106\\xc2\\xb1102\\xc3...</td>\n",
       "      <td>[0.02164333, 1.0030156e-05, 0.0028665832, 0.02...</td>\n",
       "      <td>GRAMM–LEACH–BLILEY ACT, 1999</td>\n",
       "      <td>1999-11-12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>b'56fbe82a-5458-4a4a-a5ed-026d9399151d'</td>\n",
       "      <td>b'113 STAT. 1339 PUBLIC LAW 106\\xc2\\xb1102\\xc3...</td>\n",
       "      <td>[0.01380091, 0.026945233, 0.02838467, 0.043132...</td>\n",
       "      <td>GRAMM–LEACH–BLILEY ACT, 1999</td>\n",
       "      <td>1999-11-12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>b'b6bf9e48-51b6-45d9-9259-b6346f93831f'</td>\n",
       "      <td>b'113 STAT. 1340 PUBLIC LAW 106\\xc2\\xb1102\\xc3...</td>\n",
       "      <td>[0.0070182937, 0.014063503, 0.026525516, 0.040...</td>\n",
       "      <td>GRAMM–LEACH–BLILEY ACT, 1999</td>\n",
       "      <td>1999-11-12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>b'f398b133-b4f5-4a34-94d1-9a97fdb658e5'</td>\n",
       "      <td>b\"113 STAT. 1341 PUBLIC LAW 106\\xc2\\xb1102\\xc3...</td>\n",
       "      <td>[0.025041763, 0.01968024, 0.030940715, 0.02899...</td>\n",
       "      <td>GRAMM–LEACH–BLILEY ACT, 1999</td>\n",
       "      <td>1999-11-12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>989</th>\n",
       "      <td>b'8e84d1d5-d87d-4351-b7eb-5d569fdb8d9c'</td>\n",
       "      <td>b'124 STAT. 2219 PUBLIC LAW 111\\xe2\\x80\\x93203...</td>\n",
       "      <td>[0.024505286, 0.015549232, 0.0536601, 0.028532...</td>\n",
       "      <td>DODD-FRANK WALL STREET REFORM AND CONSUMER PRO...</td>\n",
       "      <td>2010-07-21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>990</th>\n",
       "      <td>b'0c47f590-050c-4374-bf8c-2a4502dc980f'</td>\n",
       "      <td>b'124 STAT. 2220 PUBLIC LAW 111\\xe2\\x80\\x93203...</td>\n",
       "      <td>[0.014071382, -0.0044553108, 0.03662071, 0.035...</td>\n",
       "      <td>DODD-FRANK WALL STREET REFORM AND CONSUMER PRO...</td>\n",
       "      <td>2010-07-21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>991</th>\n",
       "      <td>b'63a2235f-d368-43b8-a1a9-a5a11d497245'</td>\n",
       "      <td>b'124 STAT. 2221 PUBLIC LAW 111\\xe2\\x80\\x93203...</td>\n",
       "      <td>[0.0005448305, 0.013075933, 0.044821188, 0.031...</td>\n",
       "      <td>DODD-FRANK WALL STREET REFORM AND CONSUMER PRO...</td>\n",
       "      <td>2010-07-21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>992</th>\n",
       "      <td>b'bac4d75e-4867-4d89-a71e-09a6762bf3c4'</td>\n",
       "      <td>b'124 STAT. 2222 PUBLIC LAW 111\\xe2\\x80\\x93203...</td>\n",
       "      <td>[0.032077603, 0.016817383, 0.04507993, 0.03376...</td>\n",
       "      <td>DODD-FRANK WALL STREET REFORM AND CONSUMER PRO...</td>\n",
       "      <td>2010-07-21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>993</th>\n",
       "      <td>b'e262e4da-f6e1-4b9d-9232-77fc3f0c81a7'</td>\n",
       "      <td>b'124 STAT. 2223 PUBLIC LAW 111\\xe2\\x80\\x93203...</td>\n",
       "      <td>[0.0387719, -0.025150038, 0.030345473, 0.04303...</td>\n",
       "      <td>DODD-FRANK WALL STREET REFORM AND CONSUMER PRO...</td>\n",
       "      <td>2010-07-21</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>994 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                 document_id  \\\n",
       "0    b'272d7d24-c232-41b6-823e-27aa6203c100'   \n",
       "1    b'89e3f2ee-f5a6-4e40-bb81-0632f08341f0'   \n",
       "2    b'56fbe82a-5458-4a4a-a5ed-026d9399151d'   \n",
       "3    b'b6bf9e48-51b6-45d9-9259-b6346f93831f'   \n",
       "4    b'f398b133-b4f5-4a34-94d1-9a97fdb658e5'   \n",
       "..                                       ...   \n",
       "989  b'8e84d1d5-d87d-4351-b7eb-5d569fdb8d9c'   \n",
       "990  b'0c47f590-050c-4374-bf8c-2a4502dc980f'   \n",
       "991  b'63a2235f-d368-43b8-a1a9-a5a11d497245'   \n",
       "992  b'bac4d75e-4867-4d89-a71e-09a6762bf3c4'   \n",
       "993  b'e262e4da-f6e1-4b9d-9232-77fc3f0c81a7'   \n",
       "\n",
       "                                                  text  \\\n",
       "0    b'PUBLIC LAW 106\\xc2\\xb1102\\xc3\\x90NOV. 12, 19...   \n",
       "1    b\"113 STAT. 1338 PUBLIC LAW 106\\xc2\\xb1102\\xc3...   \n",
       "2    b'113 STAT. 1339 PUBLIC LAW 106\\xc2\\xb1102\\xc3...   \n",
       "3    b'113 STAT. 1340 PUBLIC LAW 106\\xc2\\xb1102\\xc3...   \n",
       "4    b\"113 STAT. 1341 PUBLIC LAW 106\\xc2\\xb1102\\xc3...   \n",
       "..                                                 ...   \n",
       "989  b'124 STAT. 2219 PUBLIC LAW 111\\xe2\\x80\\x93203...   \n",
       "990  b'124 STAT. 2220 PUBLIC LAW 111\\xe2\\x80\\x93203...   \n",
       "991  b'124 STAT. 2221 PUBLIC LAW 111\\xe2\\x80\\x93203...   \n",
       "992  b'124 STAT. 2222 PUBLIC LAW 111\\xe2\\x80\\x93203...   \n",
       "993  b'124 STAT. 2223 PUBLIC LAW 111\\xe2\\x80\\x93203...   \n",
       "\n",
       "                                            embeddings  \\\n",
       "0    [0.034452137, 0.03166917, -0.011892043, 0.0184...   \n",
       "1    [0.02164333, 1.0030156e-05, 0.0028665832, 0.02...   \n",
       "2    [0.01380091, 0.026945233, 0.02838467, 0.043132...   \n",
       "3    [0.0070182937, 0.014063503, 0.026525516, 0.040...   \n",
       "4    [0.025041763, 0.01968024, 0.030940715, 0.02899...   \n",
       "..                                                 ...   \n",
       "989  [0.024505286, 0.015549232, 0.0536601, 0.028532...   \n",
       "990  [0.014071382, -0.0044553108, 0.03662071, 0.035...   \n",
       "991  [0.0005448305, 0.013075933, 0.044821188, 0.031...   \n",
       "992  [0.032077603, 0.016817383, 0.04507993, 0.03376...   \n",
       "993  [0.0387719, -0.025150038, 0.030345473, 0.04303...   \n",
       "\n",
       "                                                 title publication_date  \n",
       "0                         GRAMM–LEACH–BLILEY ACT, 1999       1999-11-12  \n",
       "1                         GRAMM–LEACH–BLILEY ACT, 1999       1999-11-12  \n",
       "2                         GRAMM–LEACH–BLILEY ACT, 1999       1999-11-12  \n",
       "3                         GRAMM–LEACH–BLILEY ACT, 1999       1999-11-12  \n",
       "4                         GRAMM–LEACH–BLILEY ACT, 1999       1999-11-12  \n",
       "..                                                 ...              ...  \n",
       "989  DODD-FRANK WALL STREET REFORM AND CONSUMER PRO...       2010-07-21  \n",
       "990  DODD-FRANK WALL STREET REFORM AND CONSUMER PRO...       2010-07-21  \n",
       "991  DODD-FRANK WALL STREET REFORM AND CONSUMER PRO...       2010-07-21  \n",
       "992  DODD-FRANK WALL STREET REFORM AND CONSUMER PRO...       2010-07-21  \n",
       "993  DODD-FRANK WALL STREET REFORM AND CONSUMER PRO...       2010-07-21  \n",
       "\n",
       "[994 rows x 5 columns]"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table.query()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0a0ca610-4038-41c2-a1f9-5c8af9d764ce",
   "metadata": {},
   "source": [
    "## Set up the LlamaIndex Query Engine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f7d29b82-c1b8-4880-869e-dabc8bdedaa4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 512 μs, sys: 23 μs, total: 535 μs\n",
      "Wall time: 550 μs\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "# gpt-4o-mini has a 128k-token context window, which can hold roughly 100 pages of retrieved text.\n",
    "K = 15\n",
    "\n",
    "query_engine = index.as_query_engine(\n",
    "    similarity_top_k=K,\n",
    "    vector_store_kwargs={\n",
    "        \"index\": \"flat_index\",\n",
    "        \"filter\": [[\"<\", \"publication_date\", datetime.date(2008, 9, 15)]],\n",
    "        \"sort_columns\": \"publication_date\",\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e442d238-03cf-41d1-9ea4-9d435bb30278",
   "metadata": {},
   "source": [
    "## Before the 2008 crisis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d791e0e-67b0-4179-a4c4-a1f5fd6d765b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The main financial regulation in the US before the 2008 financial crisis was the Gramm-Leach-Bliley Act, enacted in 1999. This act facilitated the affiliation among banks, securities firms, and insurance companies, effectively repealing parts of the Glass-Steagall Act, which had previously separated these financial services. The Gramm-Leach-Bliley Act aimed to enhance competition in the financial services industry by providing a framework for the integration of various financial institutions.\n",
      "CPU times: user 61.8 ms, sys: 0 ns, total: 61.8 ms\n",
      "Wall time: 4.24 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "result = query_engine.query(\n",
    "    \"\"\"\n",
    "    What was the main financial regulation in the US before the 2008 financial crisis?\n",
    "    \"\"\"\n",
    ")\n",
    "print(result.response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ee5c52ed-fc1a-4f4f-8cc4-efbb4e5a067f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The Gramm-Leach-Bliley Act of 1999 aimed to enhance competition in the financial services industry by allowing affiliations among banks, securities firms, and insurance companies. Its strengths include the repeal of the Glass-Steagall Act, which had previously separated commercial banking from investment banking, thereby enabling financial institutions to diversify their services and potentially increase competition. This diversification could lead to more innovative financial products and services.\n",
      "\n",
      "However, the Act also has notable weaknesses. By allowing greater affiliations and reducing regulatory barriers, it may have contributed to the creation of \"too big to fail\" institutions, which posed systemic risks to the financial system. The lack of stringent oversight and the ability for financial holding companies to engage in a wide range of activities without adequate regulation may have led to excessive risk-taking. Additionally, the Act did not sufficiently address the complexities of modern financial products, such as derivatives, which played a significant role in the 2008 financial crisis.\n",
      "\n",
      "In summary, while the Gramm-Leach-Bliley Act aimed to foster competition and innovation in the financial sector, its regulatory framework may have inadvertently facilitated the conditions that led to the financial crisis, highlighting the need for a more robust regulatory approach to oversee the interconnectedness and risks within the financial system.\n",
      "CPU times: user 45.7 ms, sys: 255 μs, total: 46 ms\n",
      "Wall time: 21.9 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "result = query_engine.query(\n",
    "    \"\"\"\n",
    "    Was the Gramm-Leach-Bliley Act of 1999 enough to prevent the 2008 crisis? Search the document and explain its strengths and weaknesses in regulating the US stock market.\n",
    "    \"\"\"\n",
    ")\n",
    "print(result.response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5c6b52ee-2086-4e50-8e88-6ad6920cc8bc",
   "metadata": {},
   "source": [
    "## After the 2008 crisis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "37753e54-959b-43a3-b596-7cef0e94d4ba",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 171 μs, sys: 0 ns, total: 171 μs\n",
      "Wall time: 175 μs\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "# gpt-4o-mini has a 128k-token context window, which can hold roughly 100 pages of retrieved text.\n",
    "K = 15\n",
    "\n",
    "query_engine = index.as_query_engine(\n",
    "    similarity_top_k=K,\n",
    "    vector_store_kwargs={\n",
    "        \"index\": \"flat_index\",\n",
    "        \"filter\": [[\">=\", \"publication_date\", datetime.date(2008, 9, 15)]],\n",
    "        \"sort_columns\": \"publication_date\",\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "445ebab3-4431-4f75-a07d-c998c98b7cfd",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "On the 15th of September 2008, Lehman Brothers, a major global financial services firm, filed for bankruptcy. This event marked one of the largest bankruptcies in U.S. history and was a significant moment in the financial crisis of 2007-2008, leading to widespread panic in financial markets and contributing to the global economic downturn.\n",
      "CPU times: user 51.4 ms, sys: 0 ns, total: 51.4 ms\n",
      "Wall time: 3.6 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "result = query_engine.query(\n",
    "    \"\"\"\n",
    "    What happened on the 15th of September 2008?\n",
    "    \"\"\"\n",
    ")\n",
    "print(result.response)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a05c539e-85c1-4592-808b-07d68e68e032",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The new US financial regulation enacted after the 2008 crisis to increase market regulation and improve consumer sentiment is the Dodd-Frank Wall Street Reform and Consumer Protection Act, which was signed into law on July 21, 2010. This legislation aimed to promote financial stability, enhance accountability and transparency in the financial system, and protect consumers from abusive financial practices.\n",
      "CPU times: user 43.7 ms, sys: 0 ns, total: 43.7 ms\n",
      "Wall time: 4.55 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "result = query_engine.query(\n",
    "    \"\"\"\n",
    "    What was the new US financial regulation enacted after the 2008 crisis to increase market regulation and to improve consumer sentiment?\n",
    "    \"\"\"\n",
    ")\n",
    "print(result.response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f06802cf-c241-4131-8a5d-529ea3933e59",
   "metadata": {},
   "source": [
    "## In-depth analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "67c2240b-7b0d-4bd8-8c19-fcf7e5ba429c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CPU times: user 227 μs, sys: 10 μs, total: 237 μs\n",
      "Wall time: 243 μs\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "# gpt-4o-mini has a 128k-token context window, which can hold roughly 100 pages of retrieved text.\n",
    "K = 20\n",
    "\n",
    "query_engine = index.as_query_engine(\n",
    "    similarity_top_k=K,\n",
    "    vector_store_kwargs={\n",
    "        \"index\": \"flat_index\",\n",
    "        \"sort_columns\": \"publication_date\",\n",
    "    },\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b5fcb92b-7e2f-4945-82c7-08bffd20a052",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The analysis of U.S. financial regulations before and after the 2008 financial crisis reveals significant changes aimed at preventing a recurrence of such a crisis. \n",
      "\n",
      "Before the crisis, the regulatory framework was characterized by a lack of comprehensive oversight, particularly for nonbank financial institutions. The regulatory environment allowed for excessive risk-taking, inadequate capital requirements, and insufficient transparency in financial transactions. This environment contributed to the housing bubble and the subsequent collapse of major financial institutions, leading to widespread economic turmoil.\n",
      "\n",
      "In response to the crisis, the Dodd-Frank Wall Street Reform and Consumer Protection Act of 2010 was enacted. This legislation introduced several key reforms:\n",
      "\n",
      "1. **Creation of the Financial Stability Oversight Council (FSOC)**: This body was established to monitor systemic risks and coordinate regulatory efforts across different financial sectors. It has the authority to recommend heightened standards and safeguards for financial activities that could pose risks to financial stability.\n",
      "\n",
      "2. **Enhanced Regulatory Oversight**: Dodd-Frank imposed stricter regulations on bank holding companies and nonbank financial companies, particularly those with significant assets. This includes requirements for stress testing, capital planning, and the submission of resolution plans to ensure orderly wind-downs in case of failure.\n",
      "\n",
      "3. **Consumer Protection Measures**: The establishment of the Consumer Financial Protection Bureau (CFPB) aimed to protect consumers from predatory lending practices and ensure transparency in financial products.\n",
      "\n",
      "4. **Volcker Rule**: This provision restricts proprietary trading by banks and limits their investments in hedge funds and private equity funds, thereby reducing conflicts of interest and excessive risk-taking.\n",
      "\n",
      "5. **Increased Transparency and Reporting Requirements**: Financial institutions are now required to disclose more information regarding their risk exposures and financial health, which enhances market discipline and investor confidence.\n",
      "\n",
      "The arguments for these reforms center around the need for a more resilient financial system that can withstand economic shocks. The reforms aim to address the systemic risks that were prevalent before the crisis, ensuring that financial institutions maintain adequate capital buffers and engage in prudent risk management practices.\n",
      "\n",
      "In conclusion, the regulatory landscape has shifted significantly since the 2008 crisis, with a focus on preventing excessive risk-taking, enhancing transparency, and protecting consumers. These measures are designed to create a more stable financial environment and mitigate the likelihood of future crises.\n",
      "CPU times: user 180 ms, sys: 437 μs, total: 180 ms\n",
      "Wall time: 10.5 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "result = query_engine.query(\n",
    "    \"\"\"\n",
    "    Analyse the US financial regulations before and after the 2008 crisis and produce a report of all related arguments to explain what happened, and to ensure that does not happen again.\n",
    "    Use both the provided context and your own knowledge but do mention explicitely which one you use.\n",
    "    \"\"\"\n",
    ")\n",
    "print(result.response)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e6604ca0",
   "metadata": {},
   "source": [
    "## Delete the KDB.AI Table\n",
    "\n",
    "Once finished with the table, it is best practice to drop it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8e1be781",
   "metadata": {},
   "outputs": [],
   "source": [
    "table.drop()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "381ca28b",
   "metadata": {},
   "source": [
    "#### Take Our Survey\n",
    "We hope you found this sample helpful! Your feedback is important to us, and we would appreciate it if you could take a moment to fill out our brief survey. Your input helps us improve our content.\n",
    "\n",
    "Take the [Survey](https://delighted.com/t/kWYXv316)\n"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
